Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes
Authors
Abstract
In contrast to Connectionist Temporal Classification (CTC) approaches, Sequence-To-Sequence (S2S) models for Handwritten Text Recognition (HTR) suffer from errors such as skipped or repeated words which often occur at the end of a sequence. In this paper, to combine the best of both approaches, we propose to use the CTC-Prefix-Score during S2S decoding. Hereby, during beam search, paths that are invalid according to the CTC confidence matrix are penalised. Our network architecture is composed of a Convolutional Neural Network (CNN) as visual backbone, bidirectional Long-Short-Term-Memory-Cells (LSTMs) as encoder, and a decoder which is a Transformer with inserted mutual attention layers. The CTC confidences are computed on the encoder, while the Transformer is only used for character-wise S2S decoding. We evaluate this setup on three HTR data sets: IAM, Rimes, and StAZH. On IAM, we achieve a competitive Character Error Rate (CER) of 2.95% when pretraining our model on synthetic data and including a character-based language model for contemporary English. Compared to other state-of-the-art approaches, our model requires about 10–20 times less parameters. Our shared implementations are accessible via a link on GitHub.
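The rescoring idea in the abstract can be sketched as follows: the encoder's CTC posterior matrix assigns each beam-search hypothesis the probability that the collapsed CTC output *begins with* that character prefix, so prefixes the CTC matrix considers implausible (e.g. hallucinated or repeated words) receive low scores and are penalised. Below is a minimal NumPy sketch of such a CTC prefix probability, in the style of Graves' prefix search; the function name and the toy posterior matrix are our own illustration, not the authors' implementation:

```python
import numpy as np

def ctc_prefix_prob(probs, prefix, blank=0):
    """Probability, under the CTC posterior matrix `probs` (shape T x V),
    that the collapsed CTC output begins with `prefix`.
    probs[t, c] is the per-frame posterior of symbol c at frame t;
    column `blank` is the CTC blank symbol."""
    T = probs.shape[0]
    # g_b[t] / g_nb[t]: prob. of having emitted the current prefix within the
    # first t frames, with the last frame blank / non-blank respectively.
    g_b = np.ones(T + 1)
    g_nb = np.zeros(T + 1)
    for t in range(1, T + 1):              # empty prefix: only blanks so far
        g_b[t] = g_b[t - 1] * probs[t - 1, blank]
    last = None
    psi = 1.0                              # prefix prob. of the empty prefix
    for c in prefix:                       # extend the prefix one char at a time
        new_b, new_nb = np.zeros(T + 1), np.zeros(T + 1)
        psi = 0.0
        for t in range(1, T + 1):
            # phi: probability mass that may start emitting `c` at frame t
            # (a repeated character needs an intervening blank)
            phi = g_b[t - 1] + (g_nb[t - 1] if c != last else 0.0)
            new_nb[t] = (new_nb[t - 1] + phi) * probs[t - 1, c]
            new_b[t] = (new_b[t - 1] + new_nb[t - 1]) * probs[t - 1, blank]
            psi += phi * probs[t - 1, c]
        g_b, g_nb, last = new_b, new_nb, c
    return psi

# Toy example: 2 frames, vocabulary {blank=0, 'a'=1}, uniform posteriors.
# Three of the four alignments, (b,a), (a,b) and (a,a), collapse to "a".
probs = np.array([[0.5, 0.5],
                  [0.5, 0.5]])
print(ctc_prefix_prob(probs, [1]))      # 0.75
print(ctc_prefix_prob(probs, [1, 1]))   # 0.0 (two 'a's need a blank between)
```

During beam search this score would be combined log-linearly with the S2S decoder's log-probability, e.g. lambda * log psi_CTC + (1 - lambda) * log p_S2S for some weight lambda; the exact combination used in the paper may differ.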
Similar resources
Sequence to Sequence Training of CTC-RNNs with Partial Windowing
Connectionist temporal classification (CTC) based supervised sequence training of recurrent neural networks (RNNs) has shown great success in many machine learning areas including end-to-end speech and handwritten character recognition. For the CTC training, however, it is required to unroll (or unfold) the RNN by the length of an input sequence. This unrolling requires a lot of memory and hinde...
Sequence to sequence learning for unconstrained scene text recognition
In this work we present a state-of-the-art approach for unconstrained natural scene text recognition. We propose a cascade approach that incorporates a convolutional neural network (CNN) architecture followed by a long short term memory model (LSTM). The CNN learns visual features for the characters and uses them with a softmax layer to detect sequences of characters. While the CNN gives very go...
Optical Music Recognition with Convolutional Sequence-to-Sequence Models
Optical Music Recognition (OMR) is an important technology within Music Information Retrieval. Deep learning models show promising results on OMR tasks, but symbol-level annotated data sets of sufficient size to train such models are not available and difficult to develop. We present a deep learning architecture called a Convolutional Sequence-to-Sequence model to both move towards an end-to-en...
A Comparison of Sequence-to-Sequence Models for Speech Recognition
In this work, we conduct a detailed evaluation of various all-neural, end-to-end trained, sequence-to-sequence models applied to the task of speech recognition. Notably, each of these systems directly predicts graphemes in the written domain, without using an external pronunciation lexicon, or a separate language model. We examine several sequence-to-sequence models including connectionist tempo...
Sequence-to-Sequence RNNs for Text Summarization
In this work, we cast text summarization as a sequence-to-sequence problem and apply the attentional encoder-decoder RNN that has been shown to be successful for Machine Translation (Bahdanau et al. (2014)). Our experiments show that the proposed architecture significantly outperforms the state-of-the-art model of Rush et al. (2015) on the Gigaword dataset without any additional tuning. We also...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-06555-2_18